5 research outputs found

    No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

    Existing online learning algorithms for adversarial Markov Decision Processes achieve $O(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\widetilde{O}(\sqrt{T} + C^{\textsf{P}})$ regret, where $C^{\textsf{P}}$ measures how adversarial the transition functions are and can be at most $O(T)$. While this algorithm itself requires knowledge of $C^{\textsf{P}}$, we further develop a black-box reduction approach that removes this requirement. Moreover, we also show that further refinements of the algorithm not only maintain the same regret bound, but also simultaneously adapt to easier environments (where losses are generated in a certain stochastically constrained manner as in Jin et al. [2021]) and achieve $\widetilde{O}(U + \sqrt{U C^{\textsf{L}}} + C^{\textsf{P}})$ regret, where $U$ is some standard gap-dependent coefficient and $C^{\textsf{L}}$ is the amount of corruption on losses. Comment: 66 pages

    An algorithm for stochastic and adversarial bandits with switching costs

    No full text
    We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played. Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((\lambda K)^{1/3}T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\left(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big)\sum_{i \neq i^*} \Delta_i^{-1}\right)$, where $\Delta_i$ are the suboptimality gaps and $i^*$ is a unique optimal arm. In the special case of $\lambda = 0$ (no switching costs), both bounds are minimax optimal within constants. We also explore variants of the problem, where the switching cost is allowed to change over time. We provide an experimental evaluation showing the competitiveness of our algorithm with the relevant baselines in the stochastic, stochastically constrained adversarial, and adversarial regimes with fixed switching cost.

    Correlation between DNA Methylation and cell proliferation identifies new candidate predictive markers in meningioma

    No full text
    Meningiomas are the most common primary tumors of the central nervous system. Based on the 2021 WHO classification, they are classified into three grades reflecting recurrence risk and aggressiveness. However, the WHO’s histopathological criteria defining these grades are somewhat subjective. Together with reliable immunohistochemical proliferation indices, other molecular markers such as those studied with genome-wide epigenetics promise to revamp the current prognostic classification. In this study, 48 meningiomas of various grades were randomly included and explored for DNA methylation with the Infinium MethylationEPIC microarray over 850k CpG sites. We conducted differential and correlative analyses on grade and several proliferation indices and markers, such as mitotic index and Ki-67 or MCM6 immunohistochemistry. We also set up Cox proportional hazard models for extensive associations between CpG methylation and survival. We identified loci highly correlated with cell growth and a targeted methylation signature of regulatory regions persistently associated with proliferation, grade, and survival. Candidate genes under the control of these regions include SMC4, ESRRG, PAX6, DOK7, VAV2, OTX1, and PCDHA-PCDHB-PCDHG, i.e., the protocadherin gene clusters. This study highlights the crucial role played by epigenetic mechanisms in shaping dysregulated cellular proliferation and provides potential biomarkers bearing prognostic and therapeutic value for the clinical management of meningioma.

    La qualité environnementale en milieu urbain

    No full text
    This issue of the journal Méditerranée brings together studies in geography, ecology, urban planning, and landscape that examine the environmental quality of the living environment in 15 cities in Europe and the Mediterranean Basin. It calls into question the normative approaches that today underpin real-estate and urban-planning products invoking the notion of "high environmental quality" (haute qualité environnementale), as well as the consensus discourse on the "sustainable city". The issue first aims to enrich the conventional criteria for evaluating urban environmental quality, which usually proceed "top down" and through indices or multi-criteria combinations. To that end, the articles relate the material, measurable parameters of the environmental quality of urban developments (for example: urban vegetation, noise/quiet, pollution, thermal comfort, etc.) to residents' sensory perceptions and subjective assessments, as well as to the political and economic stakes of the relationship to the environment. The objective is to understand the social construction of "environmental value systems" as a whole, taking into account their political stakes, socio-economic relations, and the practices, representations, or territorial claims of residents in the contexts analyzed.